Skip to content

Conversation

@liancheng
Copy link
Contributor

Now we also report path and schema of the file in trouble.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we just use a foreach on footers and then combine all of these in a single pass? Seems simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. something like

var mergedSchema = StructType(Nil)
footers.foreach { file =>
  val schema = ParquetRelation.readSchemaFromFooter(footer, converter)
  try {
    mergedSchema = mergedSchema.merge(schema)
  } catch {
    ...
  }
}
mergedSchema

@liancheng liancheng force-pushed the schema-merging-failure-message branch from 0ff9651 to bfe4987 Compare January 29, 2016 02:31
@SparkQA
Copy link

SparkQA commented Jan 29, 2016

Test build #50331 has finished for PR 10972 at commit bfe4987.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get the schema from first footer and then go through this loop for remaining footers? Because you merge the first schema with an empty schema, I think the all fields in merged schema will be optional in their metadata. So the pushing down of filters will not normally work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you'll right, filter push-down can be affected due to #9940 (which I just merged today). Thanks for pointing this out!

@SparkQA
Copy link

SparkQA commented Jan 29, 2016

Test build #50327 has finished for PR 10972 at commit 1ce61a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we catch something tighter here? This is too broad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also i'd prefer to make this more concise, e.g.

                  throw new SparkException(
                    "Failed merging schema of file ${footer.getFile}: ${schema.treeString}", cause)

I don't know why these functions are super long.

@rxin
Copy link
Contributor

rxin commented Jan 29, 2016

@viirya do you want to submit a pull request to address your issue? @liancheng is busy with something right now.

i.e. create a pull request that contains this one and set the 1st

@SparkQA
Copy link

SparkQA commented Jan 29, 2016

Test build #50354 has finished for PR 10972 at commit 2e5eddb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor Author

@viirya It would be great if you can help since you are pretty familiar with this part of code :)

@liancheng
Copy link
Contributor Author

I'm closing this one.

@liancheng liancheng closed this Jan 29, 2016
@viirya
Copy link
Member

viirya commented Jan 29, 2016

@rxin @liancheng yes, I would love to do it. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants